A Soft Segment Modeling Approach for Duration Modeling in Phoneme Recognition Systems
Abstract
The geometric distribution of state durations is one of the main performance-limiting assumptions of hidden Markov modeling of speech signals. Stochastic segment models in general, and segmental HMMs in particular, overcome this deficiency only partly, and at the cost of greater complexity in both the training and recognition phases. In this paper, a new duration modeling approach is presented. The main idea of the model is to consider the effect of adjacent segments when estimating and evaluating the probability density function of each acoustic segment. This idea not only makes the model robust against segmentation errors, but also models the gradual change from one segment to the next with a minimal set of parameters. The proposed idea is formulated analytically and tested on a TIMIT-based, context-independent phoneme classification system. During testing, phonemes from different phoneme classes were classified using the various proposed recognition algorithms. The system was optimized and the results were compared with a continuous density hidden Markov model (CDHMM) of similar computational complexity. The results show a slight improvement in phoneme recognition rate over the standard CDHMM, which indicates that the proposed model is better matched to the nature of speech.
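To make the opening claim of the abstract concrete: in a standard HMM, a state with self-transition probability a is occupied for exactly d frames with probability a^(d-1) (1 - a). This is a geometric distribution, so the most likely duration is always a single frame, whatever the phoneme. The Python sketch below illustrates that property only. The function names and the value a = 0.8 are illustrative assumptions and are not taken from the paper, and the code does not implement the proposed soft segment model.

```python
import random


def geometric_duration_pmf(self_loop_prob, duration):
    """Probability that one HMM state is occupied for exactly `duration` frames.

    The state is entered, repeats itself (duration - 1) times with
    probability `self_loop_prob` each, then leaves with the remaining mass.
    """
    if duration < 1:
        return 0.0
    return self_loop_prob ** (duration - 1) * (1.0 - self_loop_prob)


def expected_duration(self_loop_prob):
    """Mean of the geometric duration distribution, in frames."""
    return 1.0 / (1.0 - self_loop_prob)


def sample_duration(self_loop_prob, rng):
    """Draw one state duration by simulating the self-loop frame by frame."""
    duration = 1
    while rng.random() < self_loop_prob:
        duration += 1
    return duration


if __name__ == "__main__":
    a = 0.8  # hypothetical self-transition probability, chosen for illustration
    for d in range(1, 8):
        print(d, geometric_duration_pmf(a, d))
    print("expected duration (frames):", expected_duration(a))
```

The printed probabilities decrease monotonically from d = 1, even though the mean duration is several frames. Measured phoneme durations usually peak away from one frame, which is the mismatch that explicit duration models try to address.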
Similar resources
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Near-miss Modeling: a Segment-based Approach to Speech Recognition
Currently, most approaches to speech recognition are frame-based in that they represent speech as a temporal sequence of feature vectors. Although these approaches have been successful, they cannot easily incorporate complex modeling strategies that may further improve speech recognition performance. In contrast, segment-based approaches represent speech as a temporal graph of feature vectors a...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...
Evaluation of Soft Segment Modeling on a Context Independent Phoneme Classification System
Abstract: The geometric distribution of state duration is considered one of the basic assumptions that limit the performance of Markov modeling of the speech signal. In general, stochastic segment models, and segmental HMMs in particular, overcome this deficiency only partly, which in turn increases the difficulty of training and recognition. In addition to this assumption, the gradual temporal change of the speech statistics has not been included within the model...
Near-miss modeling: a segment-based approach to speech recognition
Currently, most approaches to speech recognition are frame-based in that they represent speech as a temporal sequence of feature vectors. Although these approaches have been successful, they cannot easily incorporate complex modeling strategies that may further improve speech recognition performance. In contrast, segment-based approaches represent speech as a temporal graph of feature vectors a...
Modeling duration patterns for speaker recognition
We present a method for speaker recognition that uses the duration patterns of speech units to aid speaker classification. The approach represents each word and/or phone by a feature vector comprised of either the durations of the individual phones making up the word, or the HMM states making up the phone. We model the vectors using mixtures of Gaussians. The speaker specific models are obtaine...
Journal title: The Modares Journal of Electrical Engineering
Publisher: Tarbiat Modares University
ISSN: 2228-527X
Volume 4, Issue 1 (2004)